THE USE OF CONSENSUS SEQUENCES FOR TARGETED HOMOLOGOUS
GENE ISOLATION AND RECOMBINATION IN GENE FAMILIES 



This is a continuing application of United States Application No. 60/070,734, filed December 11, 1997. 

FIELD OF THE INVENTION

The invention relates to compositions and methods for targeting sequence modifications in one or more
genes of a related family of genes using enhanced homologous recombination techniques. The
invention also relates to compositions and methods for isolating and identifying novel members of
homologous sequence families. These techniques may be used to create animal or plant models of
disease as well as to identify new targets for drug or pathogen screening.

BACKGROUND 

Homologous recombination (or general recombination) is defined as the exchange of homologous
segments anywhere along a length of two DNA molecules. An essential feature of general
recombination is that the enzymes responsible for the recombination event can presumably use any
pair of homologous sequences as substrates, although some types of sequence may be favored over
others. Both genetic and cytological studies have indicated that such a crossing-over process occurs
between pairs of homologous chromosomes during meiosis in higher organisms.

Alternatively, in site-specific recombination, exchange occurs at a specific site, as in the integration of
phage λ into the E. coli chromosome and the excision of λ DNA from it. Site-specific recombination
involves specific inverted repeat sequences; e.g. the Cre-loxP and FLP-FRT systems. Within these
sequences there is only a short stretch of homology necessary for the recombination event, but not
sufficient for it. The enzymes involved in this event generally cannot recombine other pairs of
homologous (or nonhomologous) sequences, but act specifically.

Although both site-specific recombination and homologous recombination are useful mechanisms for
genetic engineering of DNA sequences, targeted homologous recombination provides a basis for
targeting and altering essentially any desired sequence in a duplex DNA molecule, such as targeting a
DNA sequence in a chromosome for replacement by another sequence. Site-specific recombination
has been proposed as one method to integrate transfected DNA at chromosomal locations having
specific recognition sites (O'Gorman et al. (1991) Science 251: 1351; Onouchi et al. (1991) Nucleic
Acids Res. 19: 6373). Unfortunately, since this approach requires the presence of specific target
sequences and recombinases, its utility for targeting recombination events at any particular
chromosomal location is severely limited in comparison to targeted general recombination.

Homologous recombination has also been used to create transgenic plants and animals. Transgenic
organisms contain stably integrated copies of genes or gene constructs derived from another species in
the chromosome of the transgenic organism. In addition, gene targeted animals can be generated by
introducing cloned DNA constructs of the foreign genes into totipotent cells by a variety of methods,
including homologous recombination. For example, animals that develop from genetically altered
totipotent cells can contain the foreign gene in all somatic cells and also in germ-line cells. Currently,
methods for producing transgenic and targeted animals have been performed on totipotent embryonic
stem (ES) cells and on fertilized zygotes. ES cells have an advantage in that large numbers of cells
can be manipulated easily by homologous recombination in vitro before they are used to generate
targeted animals. Currently, however, only embryonic stem cells from mice have been shown to
contribute to the germ line. Alternatively, DNA can also be introduced into fertilized oocytes by
micro-injection into pronuclei; the oocytes are then transferred into the uterus of a pseudo-pregnant
recipient animal to develop to term.

The ability of mammalian and human cells to incorporate exogenous genetic material into genes
residing on chromosomes has demonstrated that these cells have the general enzymatic machinery for
carrying out the homologous recombination required between resident and introduced sequences. These
targeted recombination events can be used to correct mutations at known sites, replace genes or gene
segments with defective ones, or introduce foreign genes into cells.

Traditionally, exogenous sequences transferred into eukaryotic cells undergo homologous
recombination with homologous endogenous sequences only at very low frequencies, and are so
inefficiently recombined that large numbers of cells must be transfected, selected, and screened in
order to generate a desired correctly targeted homologous recombinant (Kucherlapati et al. (1984)
Proc. Natl. Acad. Sci. (U.S.A.) 81: 3153; Smithies, O. (1985) Nature 317: 230; Song et al. (1987) Proc.
Natl. Acad. Sci. (U.S.A.) 84: 6820; Doetschman et al. (1987) Nature 330: 576; Kim and Smithies (1988)
Nucleic Acids Res. 16: 8887; Doetschman et al. (1988) op.cit.; Koller and Smithies (1989) op.cit.;
Shesely et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 4294; Kim et al. (1991) Gene 103: 227, which
are incorporated herein by reference).

Several proteins or purified extracts having the property of promoting homologous recombination (i.e.,
recombinase activity) have been identified in prokaryotes and eukaryotes (Cox and Lehman (1987) Ann.
Rev. Biochem. 56: 229; Radding, C.M. (1982) op.cit.; Madiraju et al. (1988) Proc. Natl. Acad. Sci.
(U.S.A.) 85: 6592; McCarthy et al. (1988) Proc. Natl. Acad. Sci. (U.S.A.) 85: 5854; Lopez et al. (1987)
op.cit., which are incorporated herein by reference). These general recombinases presumably promote
one or more steps in the formation of homologously-paired intermediates, strand-exchange, gene
conversion, and/or other steps in the process of homologous recombination.

The frequency of homologous recombination in prokaryotes is significantly enhanced by the presence
of recombinase activities. Several purified proteins catalyze homologous pairing and/or strand
exchange in vitro, including: E. coli recA protein, the T4 uvsX protein, the rec1 protein from Ustilago
maydis, and Rad51 protein from S. cerevisiae (Sung et al., Science 265:1241 (1994)) and human cells
(Baumann et al., Cell 87:757 (1996)). Additional members of this protein family have been identified by
homology and function including Rad51 A, B, C, D & E (Dosanjh et al. (1998) Nucleic Acids Res.
26:1179-1184) and dmc1. Recombinases such as dmc1, like the recA protein of E. coli, are proteins which
promote strand pairing and exchange. The most studied recombinase to date has been the recA
recombinase of E. coli, which is involved in homology search and strand exchange reactions (see Cox
and Lehman (1987) op.cit.). RecA is required for induction of the SOS repair response, DNA repair,
and efficient genetic recombination in E. coli. RecA can catalyze homologous pairing of a linear duplex
DNA and a homologous single strand DNA in vitro. In contrast to site-specific recombinases, proteins
like recA which are involved in general recombination recognize and promote pairing of DNA structures
on the basis of shared homology, as has been shown by several in vitro experiments (Hsieh and
Camerini-Otero (1989) J. Biol. Chem. 264: 5089; Howard-Flanders et al. (1984) Nature 309: 215;
Stasiak et al. (1984) Cold Spring Harbor Symp. Quant. Biol. 49: 561; Register et al. (1987) J. Biol.
Chem. 262: 12812). Several investigators have used recA protein in vitro to promote homologously
paired triplex DNA (Cheng et al. (1988) J. Biol. Chem. 263: 15110; Ferrin and Camerini-Otero (1991)
Science 254: 1494; Ramdas et al. (1989) J. Biol. Chem. 264: 11395; Strobel et al. (1991) Science 254:
1639; Hsieh et al. (1990) op.cit.; Rigas et al. (1986) Proc. Natl. Acad. Sci. (U.S.A.) 83: 9591; and
Camerini-Otero et al. U.S. 7,611,268, which are incorporated herein by reference).

Recent advances have resulted in techniques allowing enhanced homologous recombination (EHR)
using recombinases such as recA and Rad51 and single-stranded nucleic acids that have sequence
heterologies. This allows sequence modifications to be specifically targeted to virtually any genomic
position. See, for example, PCT US93/03868 and PCT US98/05223, both of which are expressly
incorporated herein by reference.

One area of pressing interest in biology is "functional genomics", i.e. the correlation of
genotype and phenotype. This requires animal systems, since phenotypic changes must be evaluated
in vivo. Similarly, and related to this idea, is the elucidation and characterization of gene families, i.e.

consensus homology clamp for the family. The method can additionally comprise identifying a target
cell having a targeted sequence modification.

In a further aspect, the invention provides methods of making a non-human organism with a targeted
sequence modification in at least one member of a gene family. The method comprises introducing
into a cell at least one recombinase and at least two single-stranded targeting polynucleotides which
are substantially complementary to each other and each having a consensus homology clamp for said
family. The cell is then subjected to conditions that result in the formation of an animal, and the animal
has at least one modification in at least one member of a consensus family of genes.



In an additional aspect, the invention provides methods of isolating a member of a gene family
comprising a protein consensus sequence. The method comprises adding to a complex mixture of
nucleic acids at least one recombinase and at least two single-stranded targeting polynucleotides which
are substantially complementary to each other and each having a consensus homology clamp for said
family. At least one of the targeting polynucleotides comprises a purification tag. The method is done
under conditions whereby the targeting polynucleotides form a complex with the member, and the
family member is isolated using said purification tag. The complex nucleic acid mixture may be a cDNA
library, a cell, RNA or a restriction endonuclease genomic digest.

In a further aspect, the invention provides non-human organisms containing a sequence modification in
an endogenous consensus functional domain of a gene member of a gene family.



BRIEF DESCRIPTION OF THE DRAWINGS 



Figures 1A and 1B depict a table of protein families and consensus protein motifs. The gene family is a
family or subfamily with common function or sequence homology used to determine consensus motifs.
The motif is the amino acid consensus sequence common to the family members, and amino acid
position is for the first human example. Parenthetical amino acids refer to all residues found at that
single position within the family. Members refers to the homologs (total and human members) used
to determine consensus sites. The degeneracy refers to the number and length of different
oligonucleotides needed in one synthesis to code for all the consensus amino acids used. Figure 1C
shows examples of DNA degeneracy.
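
As an illustration only, the number of oligonucleotides that exactly encode a motif can be estimated by multiplying, position by position, the number of codons available for each permitted residue. The following Python sketch assumes the standard genetic code and a motif notation in which parenthetical groups list the residues allowed at one position. The names `parse_motif` and `motif_degeneracy` are hypothetical and are not taken from the figures, and the values in Figures 1A-1C may have been derived differently (for example, where additional degeneracy is accepted so that one synthesis suffices).

```python
import re

# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def parse_motif(motif):
    """Split a motif such as 'EAM (LF) (YFH)' into per-position residue strings."""
    tokens = re.findall(r"\(([A-Z]+)\)|([A-Z])", motif.replace(" ", ""))
    return [group or single for group, single in tokens]


def motif_degeneracy(motif):
    """Return (number of exact coding sequences, oligonucleotide length in nt)."""
    positions = parse_motif(motif)
    total = 1
    for residues in positions:
        # Every codon of every permitted residue is an option at this position.
        total *= sum(CODON_COUNTS[aa] for aa in set(residues))
    return total, 3 * len(positions)


if __name__ == "__main__":
    print(motif_degeneracy("EAM (LF) (YFH)"))
```

Under these assumptions the motif "EAM (LF) (YFH)" gives 2 x 4 x 1 x 8 x 6 = 384 coding sequences of 15 nucleotides each.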

Figure 2 depicts a schematic for gene family member isolation and modification. The degenerate
probe can be made by several different means including those shown. Libraries or linear nucleic acids
can be used for targeting. Capture can utilize a biotin moiety as shown or others, described in the text
and known in the art.



Figure 3 depicts gene family member targeting in animals and cells.

genes that can modulate disease states caused by different genes, either genes within the same gene
family or a completely different gene family. Thus, for example, the loss of one type of enzymatic
activity, resulting in a disease phenotype, may be compensated by alterations in a different but
homologous enzymatic activity. For example, the effects of the elimination of one kinase in a MAP
kinase cascade can be overcome by another parallel pathway.

Accordingly, the present invention provides methods and compositions utilizing homology motif tags
(HMTs) or consensus sequences. By "homology motif tag" or "protein consensus sequence" herein is
meant an amino acid consensus sequence of a gene family. By "consensus nucleic acid sequence"
herein is meant a nucleic acid that encodes a consensus protein sequence of a functional domain of a
gene family. In addition, "consensus nucleic acid sequence" can also refer to cis sequences that are
non-coding but can serve a regulatory or other role. As outlined below, generally a library of consensus
nucleic acid sequences is used, comprising a set of degenerate nucleic acids encoding the protein
consensus sequence. A wide variety of protein consensus sequences for a number of gene families
are known. A "gene family" therefore is a set of genes that encode proteins that contain a functional
domain for which a consensus sequence can be identified. However, in some instances, a gene family
includes non-coding sequences; for example, consensus regulatory regions can be identified. For
example, gene family/consensus sequence pairs are known for the G-protein coupled receptor family,
the AAA-protein family, the bZIP transcription factor family, the mutS family, the recA family, the Rad51
family, the dmc1 family, the recF family, the SH2 domain family, the Bcl-2 family, the single-stranded
binding protein family, the TFIID transcription family, the TGF-beta family, the TNF family, the XPA
family, the XPG family, actin binding proteins, bromodomain GDP exchange factors, MCM family,
ser/thr phosphatase family, etc.

As will be appreciated by those in the art, the proteins of the gene families generally do not contain the
exact consensus sequences; generally consensus sequences are artificial sequences that represent
the best comparison of a variety of sequences. The actual sequence that corresponds to the functional
sequence within a particular protein is termed a "consensus functional domain" herein; that is, a
consensus functional domain is the actual sequence within a protein that corresponds to the consensus
sequence. A consensus functional domain may also be a "predetermined endogenous DNA
sequence" (also referred to herein as a "predetermined target sequence") that is a polynucleotide
sequence contained in a target cell. Such sequences can include, for example, chromosomal
sequences (e.g., structural genes, regulatory sequences including promoters and enhancers,
recombinatorial hotspots, repeat sequences, integrated proviral sequences, hairpins, palindromes),
episomal or extrachromosomal sequences (e.g., replicable plasmids or viral replication intermediates)
including chloroplast and mitochondrial DNA sequences. By "predetermined" or "pre-selected" it is
meant that the consensus functional domain target sequence may be selected at the discretion of the
practitioner on the basis of known or predicted sequence information, and is not constrained to specific
sites recognized by certain site-specific recombinases (e.g., FLP recombinase or CRE recombinase).

In some embodiments, the predetermined endogenous DNA target sequence will be other than a
naturally occurring germline DNA sequence (e.g., a transgene, parasitic, mycoplasmal or viral
sequence).

In a preferred embodiment, the gene family is the G-protein coupled receptor family, which has over
900 identified members, including several subfamilies. In a preferred embodiment, the G-protein
coupled receptors are from subfamily 1 and are also called R7G proteins. They are an extensive group
of receptors which recognize hormones, neurotransmitters, odorants and light and transduce
extracellular signals by interaction with guanine (G) nucleotide-binding proteins. The structure of all
these receptors is thought to be virtually identical, and they contain seven hydrophobic regions, each of
which putatively spans the membrane. The N-terminus is extracellular and is frequently glycosylated,
and the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops alternate
with three cytoplasmic loops to link the seven transmembrane regions. G-protein coupled receptors
include, but are not limited to: the class A rhodopsin first subfamily, including amine (acetylcholine
(muscarinic), adrenoceptors, dopamine, histamine, serotonin, octopamine), peptides (angiotensin,
bombesin, bradykinin, C5a anaphylatoxin, Fmet-leu-phe, interleukin-8, chemokine, CCK, endothelin,
melanocortin, neuropeptide Y, neurotensin, opioid, somatostatin, tachykinin, thrombin, vasopressin-like,
galanin, proteinase activated), hormone proteins (follicle stimulating hormone, lutropin-
choriogonadotropic hormone, thyrotropin), rhodopsin (vertebrate), olfactory (olfactory type 1-11,
gustatory), prostanoid (prostaglandin, prostacyclin, thromboxane), nucleotide (adenosine,
purinoceptors), cannabis, platelet activating factor, gonadotropin-releasing hormone (gonadotropin
releasing hormone, thyrotropin-releasing hormone, growth hormone secretagogue), melatonin, viral
proteins, MHC receptor, Mas proto-oncogene, EBV-induced and glucocorticoid induced; the class B
secretin second subfamily, including calcitonin, corticotropin releasing factor, gastric inhibitory peptide,
glucagon, growth hormone releasing hormone, parathyroid hormone, secretin, vasoactive intestinal
polypeptide, and diuretic hormone; the class C metabotropic glutamate third subfamily, including
metabotropic glutamate and extracellular calcium-sensing agents; and the class D pheromone fourth
subfamily.

Because of the large number of family members, these large classes of GPCRs can be further
subdivided into subfamilies. Examples of these subfamilies are included in Figures 1A and 1B, where
metabotropic is from class C; calcitonin, glucagon, vasoactive and parathyroid are from class B; and
acetylcholine, histamine, angiotensin, α2- and β-adrenergic are from class A. From each subfamily
small protein consensus sequences can be derived from sequence alignments. For example, Figure
1A shows 6 motifs for the metabotropic glutamate-like GPCRs derived from the indicated number of
family members. Figure 1C shows certain examples like the first "EAM (LF) (YFH)" using the single
letter amino acid code as is known in the art. Using the protein consensus sequence, degenerate
nucleic acid probes are made to encode the protein consensus sequence, as is generally shown in
Figure 1, as is well known in the art. The protein sequence is encoded by DNA triplets which are
deduced using standard tables. In some cases additional degeneracy is used to enable production in
one oligonucleotide synthesis. In many cases motifs were chosen to minimize degeneracy. The
examples shown in Figures 1A-C were designed to facilitate use for amplification of neighboring
sequences as shown in Figure 2. This can utilize two motifs as indicated by faithful or error prone
amplification. Alternatively outside sequences can be used as is indicated using vector sequence. In
addition degenerate oligos can be synthesized and used directly in the procedure without amplification.
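
The deduction of DNA triplets from a protein consensus sequence can be sketched as follows. This is an illustrative Python example only, assuming the standard genetic code and the IUPAC nucleotide ambiguity codes; the function name `degenerate_oligo` and the approach of merging all codons at a position into one ambiguity code per base are assumptions made for illustration, not the procedure used to prepare the figures. It returns a single degenerate oligonucleotide together with the size of the pool that one synthesis of it would contain.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for triplet, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(triplet))

# IUPAC ambiguity codes keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_oligo(positions):
    """Back-translate per-position residue strings into one degenerate oligo.

    Returns (oligo, pool_size), where pool_size is the number of distinct
    sequences produced by a single synthesis of the degenerate oligo.
    """
    oligo = []
    pool_size = 1
    for residues in positions:
        codons = [c for aa in residues for c in CODONS[aa]]
        for i in range(3):
            # Union of the bases seen at this codon position.
            bases = sorted({c[i] for c in codons})
            oligo.append(IUPAC["".join(bases)])
            pool_size *= len(bases)
    return "".join(oligo), pool_size


if __name__ == "__main__":
    # Motif "EAM (LF) (YFH)" written as one residue string per position.
    print(degenerate_oligo(["E", "A", "M", "LF", "YFH"]))
```

In this sketch the last position, (YFH), becomes the triplet YWY, which also covers the leucine codons CTT and CTC. The pool is therefore slightly larger than the set of exact coding sequences, which is one way that additional degeneracy can arise when a single synthesis is used.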

As diagrammed in Figure 2, these double stranded (ds) DNA probes are denatured and coated with RecA
or another recombinase such as Rad51. This material can be used to bind to and allow capture of
specific clones from cDNA or genomic libraries. Alternatively this material can be introduced into cells,
producing transgenic cells or animals with alterations in related family members.

In addition to the first subfamily of G-protein coupled receptors, there is a second subfamily encoding
receptors that bind peptide hormones that do not show sequence similarity to the first R7G subfamily.
All the characterized receptors in this subfamily are coupled to G-proteins that activate both adenylyl
cyclase and the phosphatidylinositol-calcium pathway. However, they are structurally similar; like
classical R7G proteins they putatively contain seven transmembrane regions, a glycosylated
extracellular N-terminus and a cytoplasmic C-terminus. Known receptors in this subfamily are encoded
on multiple exons, and several of these genes are alternatively spliced to yield functionally distinct
products. The N-terminus contains five conserved cysteine residues putatively important in disulfide
bonds. Known G-protein coupled receptors in this subfamily are listed above.

In addition to the first and second subfamilies of G-protein coupled receptors, there is a third subfamily
encoding receptors that bind glutamate and calcium but do not show sequence similarity to either of the
other subfamilies. Structurally, this subfamily has signal sequences, very large hydrophobic
extracellular regions of about 540 to 600 amino acids that contain 17 conserved cysteines (putatively
involved in disulfides), a region of about 250 residues that appears to contain seven transmembrane
domains, and a C-terminal cytoplasmic domain of variable length (50 to 350 residues). Known
G-protein coupled receptors of this subfamily are listed above.

In a preferred embodiment, the gene family is the bZIP transcription factor family. This eukaryotic gene
family encodes DNA binding transcription factors that contain a basic region that mediates sequence
specific DNA binding, and a leucine zipper, required for dimerization. The bZIP family includes, but is
not limited to, AP-1, ATF, CREB, CREM, FOS, FRA, GBF, GCN4, HBP, JUN, MET4, OCS1, OP, TAF1,
XBP1, and YBBO.

In a preferred embodiment, the gene family is involved in DNA mismatch repair, such as mutL, hexB 
and PMS1. Members of this family include, but are not limited to, MLH1, PMS1, PMS2, HexB and MutL.
The protein consensus sequence is G-F-R-G-E-A-L. 
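
By way of illustration, a protein consensus sequence of this kind can be located within a candidate protein sequence by a simple pattern search. The Python sketch below is a minimal example under stated assumptions: the helper names `motif_to_regex` and `find_motif` are hypothetical, the test peptide is invented for the example and is not a real family member, and only exact matches to the consensus are reported, whereas actual family members generally do not contain the exact consensus sequence.

```python
import re


def motif_to_regex(motif):
    """Compile a motif such as 'G-F-R-G-E-A-L' or 'EAM(LF)(YFH)' into a regex.

    Hyphens and spaces are ignored; a parenthetical group lists the residues
    permitted at a single position and becomes a character class.
    """
    cleaned = motif.replace("-", "").replace(" ", "")
    pattern = re.sub(r"\(([A-Z]+)\)", r"[\1]", cleaned)
    return re.compile(pattern)


def find_motif(motif, protein):
    """Return 1-based start positions of non-overlapping exact motif matches."""
    regex = motif_to_regex(motif)
    return [match.start() + 1 for match in regex.finditer(protein)]


if __name__ == "__main__":
    # Invented peptide used only to exercise the search.
    print(find_motif("G-F-R-G-E-A-L", "MKTGFRGEALASI"))
```

A search that tolerates mismatches would be needed to find family members that diverge from the consensus; that extension is not shown here.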

the reversal of a DNA segment; and b) recombination between repeat sequences on two DNA
molecules resulting in their cointegration, or between repeats on one DNA molecule resulting in the
excision of a DNA fragment. Site-specific recombination is characterized by a strand exchange
mechanism that requires no DNA synthesis or high energy cofactor; the phosphodiester bond energy is
conserved in a phospho-protein linkage during strand cleavage and re-ligation.



Two unrelated families of recombinases are currently known. The first, called the "phage integrase"
family, groups a number of bacterial, phage and yeast plasmid enzymes. The second, called the
"resolvase" family, groups enzymes which share the following structural characteristics: an N-terminal
catalytic and dimerization domain that contains a conserved serine residue involved in the transient
covalent attachment to DNA, and a C-terminal helix-turn-helix DNA-binding domain.



In a preferred embodiment, the gene family is the single-stranded binding protein family. The E. coli
single-stranded binding protein (ssb), also known as the helix-destabilizing protein, is a protein of 177
amino acids. It binds tightly as a homotetramer to single-stranded DNA (ss-DNA) and plays an
important role in DNA replication, recombination and repair. Members of the ssb family include, but are
not limited to, E. coli ssb and eukaryotic RPA proteins.



In a preferred embodiment, the gene family is the TFIID transcription family. Transcription factor TFIID
(or TATA-binding protein, TBP) is a general factor that plays a major role in the activation of eukaryotic
genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element
which lies close to the position of transcription initiation. There is a remarkable degree of sequence
conservation of a C-terminal domain of about 180 residues in TFIID from various eukaryotic sources.
This region is necessary and sufficient for TATA box binding. The most significant structural feature of
this domain is the presence of two conserved repeats of a 77 amino-acid region.

In a preferred embodiment, the gene family is the TGF-β family. Transforming growth factor-β (TGF-β)
is a multifunctional protein that controls proliferation, differentiation and other functions in many cell
types. TGF-β-1 is a protein of 112 amino acid residues derived by proteolytic cleavage from the
C-terminal portion of the precursor protein. Members of the TGF-β family include, but are not limited to,
the TGF-1-3 subfamily (including TGF1, TGF2, and TGF3); the BMP3 subfamily (BM3B, BMP3); the
BMP5-8 subfamily (BM8A, BMP5, BMP6, BMP7, and BMP8); and the BMP 2 & 4 subfamily (BMP2,
BMP4, DECA).



Some protein consensus sequences of the TGF-β family are shown in Figure 1.

In a preferred embodiment, the gene family is the TNF family. A number of cytokines can be grouped
into a family on the basis of amino acid sequence, as well as structural and functional similarities.
These include (1) tumor necrosis factor (TNF), also known as cachectin or TNF-α, which is a cytokine

properly bind to and position targeting polynucleotides on their homologous targets and (ii) the ability of
recombinase protein/targeting polynucleotide complexes to efficiently find and bind to complementary
endogenous sequences. The best characterized recA protein is from E. coli; in addition to the wild-type
protein, a number of mutant recA proteins have been identified (e.g., recA803; see Madiraju et al.,
PNAS USA 85(18):6592 (1988); Madiraju et al., Biochem. 31:10529 (1992); Lavery et al., J. Biol. Chem.
267:20648 (1992)). Further, many organisms have recA-like recombinases with strand-transfer
activities (e.g., Fugisawa et al., (1985) Nucl. Acids Res. 13: 7473; Hsieh et al., (1986) Cell 44: 885;
Hsieh et al., (1989) J. Biol. Chem. 264: 5089; Fishel et al., (1988) Proc. Natl. Acad. Sci. USA 85:
3683; Cassuto et al., (1987) Mol. Gen. Genet. 208: 10; Ganea et al., (1987) Mol. Cell Biol. 7: 3124;
Moore et al., (1990) J. Biol. Chem. 19: 11108; Keene et al., (1984) Nucl. Acids Res. 12: 3057; Kmiec,
(1984) Cold Spring Harbor Symp. 48: 675; Kmiec, (1986) Cell 44: 545; Kolodner et al., (1987) Proc.
Natl. Acad. Sci. USA 84: 5560; Sugino et al., (1985) Proc. Natl. Acad. Sci. USA 85: 3683; Halbrook et
al., (1989) J. Biol. Chem. 264: 21403; Eisen et al., (1988) Proc. Natl. Acad. Sci. USA 85: 7481;
McCarthy et al., (1988) Proc. Natl. Acad. Sci. USA 85: 5854; Lowenhaupt et al., (1989) J. Biol. Chem.
264: 20568, which are incorporated herein by reference). Examples of such recombinase proteins
include, for example but not limited to: recA, recA803, uvsX, and other recA mutants and recA-like
recombinases (Roca, A. I. (1990) Crit. Rev. Biochem. Molec. Biol. 25: 415), sep1 (Kolodner et al.
(1987) Proc. Natl. Acad. Sci. (U.S.A.) 84:5560; Tishkoff et al. Molec. Cell. Biol. 11:2593), RuvC
(Dunderdale et al. (1991) Nature 354: 506), DST2, KEM1, XRN1 (Dykstra et al. (1991) Molec. Cell.
Biol. 11:2583), STPα/DST1 (Clark et al. (1991) Molec. Cell. Biol. 11:2576), HPP-1 (Moore et al. (1991)
Proc. Natl. Acad. Sci. (U.S.A.) 88:9067), other target recombinases (Bishop et al. (1992) Cell 69: 439;
Shinohara et al. (1992) Cell 69: 457); incorporated herein by reference. RecA may be purified from E.
coli strains, such as E. coli strains JC12772 and JC15369 (available from A.J. Clark and M. Madiraju,
University of California-Berkeley, or purchased commercially). These strains contain the recA coding
sequences on a "runaway" replicating plasmid vector present at a high copy number per cell. The
recA803 protein is a high-activity mutant of wild-type recA. The art teaches several examples of
recombinase proteins, for example, from Drosophila, yeast, plant, human, and non-human mammalian
cells, including proteins with biological properties similar to recA (i.e., recA-like recombinases), such as
Rad51, Rad57, dmc1 from mammals and yeast, and Pk-rec (see Rashid et al., Nucleic Acid Res.
25(4):719 (1997), hereby incorporated by reference). In addition, the recombinase may actually be a
complex of proteins, i.e. a "recombinosome". In addition, included within the definition of a recombinase
are portions or fragments of recombinases which retain recombinase biological activity, as well as
variants or mutants of wild-type recombinases which retain biological activity, such as the E. coli
recA803 mutant with enhanced recombinase activity.

In a preferred embodiment, recA or rad51 is used. For example, recA protein is typically obtained from
bacterial strains that overproduce the protein: wild-type E. coli recA protein and mutant recA803 protein
may be purified from such strains. Alternatively, recA protein can also be purchased from, for example,
Pharmacia (Piscataway, NJ) or Boehringer Mannheim (Indianapolis, Indiana).

By "targeting polynucleotides" herein is meant the polynucleotides used to make alterations in the
consensus functional domains of members of gene families as described herein. Targeting
polynucleotides are generally ssDNA or dsDNA, most preferably two complementary single-stranded
DNAs.

Targeting polynucleotides are generally at least about 5 to 2000 nucleotides long, preferably about 12
to 200 nucleotides long, at least about 200 to 500 nucleotides long, more preferably at least about 500
to 2000 nucleotides long, or longer; however, as the length of a targeting polynucleotide increases
beyond about 20,000 to 50,000 to 400,000 nucleotides, the efficiency of transferring an intact targeting
polynucleotide into the cell decreases. The length of homology may be selected at the discretion of the
practitioner on the basis of the sequence composition and complexity of the predetermined endogenous
target DNA sequence(s) and guidance provided in the art, which generally indicates that 1.3 to 6.8
kilobase segments of homology are preferred when non-recombinase mediated methods are utilized
(Hasty et al. (1991) Molec. Cell. Biol. 11: 5586; Shulman et al. (1990) Molec. Cell. Biol. 10: 4466, which
are incorporated herein by reference).

Targeting polynucleotides have at least one sequence that substantially corresponds to, or is 
substantially complementary to, a consensus functional domain, i.e. the predetermined endogenous 
DNA sequence (i.e., a DNA sequence of a polynucleotide located in a target cell, such as a 
chromosomal, mitochondrial, chloroplast, viral, extrachromosomal, or mycoplasmal polynucleotide). 
By "corresponds to" herein is meant that a polynucleotide sequence is homologous (i.e., may be similar 
or identical, not strictly evolutionarily related) to all or a portion of a reference polynucleotide sequence, 
or that a polypeptide sequence is identical to a reference polypeptide sequence. In contradistinction, 
the term "complementary to" is used herein to mean that the complementary sequence can hybridize to 
all or a portion of a reference polynucleotide sequence. Thus, one of the complementary single 
stranded targeting polynucleotides is complementary to one strand of the endogenous target 
consensus sequence (i.e. Watson) and corresponds to the other strand of the endogenous target 
consensus sequence (i.e. Crick). Thus, the complementarity between two single-stranded targeting 
polynucleotides need not be perfect. For illustration, the nucleotide sequence "TATAC" corresponds to 
a reference sequence "TATAC" and is perfectly complementary to a reference sequence "GTATA". 
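The "TATAC"/"GTATA" illustration above can be checked with a short sketch. The function names below are hypothetical and are not part of the specification; the sketch handles only the exact-match case, whereas the text also allows merely similar sequences to "correspond".

```python
# Minimal sketch: "corresponds to" versus "complementary to" for exact matches.
# All names here are illustrative assumptions, not terms from the specification.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")  # Watson-Crick base pairing


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def corresponds_to(query: str, reference: str) -> bool:
    """True when the query is identical to all or a portion of the reference."""
    return query.upper() in reference.upper()


def perfectly_complementary_to(query: str, reference: str) -> bool:
    """True when the query can base-pair perfectly with a portion of the reference."""
    return reverse_complement(query) in reference.upper()


if __name__ == "__main__":
    print(corresponds_to("TATAC", "TATAC"))
    print(perfectly_complementary_to("TATAC", "GTATA"))
```

Reading both strands 5' to 3', the complement of "TATAC" is "ATATG", which reversed gives "GTATA", matching the example in the text.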

The terms "substantially corresponds to" or "substantial identity" or "homologous" as used herein 
denote a characteristic of a nucleic acid sequence, wherein a nucleic acid sequence has at least about 
50 percent sequence identity as compared to a reference sequence, typically at least about 70 percent 
sequence identity, and preferably at least about 85 percent sequence identity as compared to a 
reference sequence. The percentage of sequence identity is calculated excluding small deletions or 
additions which total less than 25 percent of the reference sequence. The reference sequence may be 
a subset of a larger sequence, such as a portion of a gene or flanking sequence, or a repetitive portion 
of a chromosome. However, the reference sequence is at least 18 nucleotides long, typically at least 
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domain nucleic acid sequence(s). Typically, targeting polynucleotides of the invention have at least one 
consensus homology clamp that is at least about 18 to 35 nucleotides long, and it is preferable that 
consensus homology clamps are at least about 20 to 100 nucleotides long, and more preferably at 
least about 100-500 nucleotides long, although the degree of sequence homology between the 
consensus homology clamp and the targeted sequence and the base composition of the targeted 
sequence will determine the optimal and minimal clamp lengths (e.g., G-C rich sequences are typically 
more thermodynamically stable and will generally require shorter clamp length). Therefore, both 
consensus homology clamp length and the degree of sequence homology can only be determined with 
reference to a particular predetermined sequence, but consensus homology clamps generally must be 
at least about 10 nucleotides long and must also substantially correspond or be substantially 
complementary to a predetermined target sequence. Preferably, a homology clamp is at least about 
10, and preferably at least about 50 nucleotides long and is substantially identical to or complementary 
to a predetermined target sequence. Without wishing to be bound by a particular theory, it is believed 
that the addition of recombinases to a targeting polynucleotide enhances the efficiency of homologous 
recombination between homologous, nonisogenic sequences (e.g., between an exon 2 sequence of an 
albumin gene of a Balb/c mouse and a homologous albumin gene exon 2 sequence of a C57/BL6 
mouse), as well as between isogenic sequences. 
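The sequence-identity thresholds discussed above (about 50, 70 and 85 percent) can be illustrated with a minimal sketch. It assumes the two sequences are already aligned, with "-" marking deletions or additions. The function names, the tier labels, and the handling of large gaps (counting them as mismatches) are assumptions made for illustration; the specification does not prescribe an algorithm.

```python
# Minimal sketch of percent identity on a pre-aligned pair of sequences.
# Names, tier labels and large-gap handling are illustrative assumptions.

def percent_identity(aligned_query: str, aligned_ref: str,
                     max_gap_fraction: float = 0.25) -> float:
    """Percent identity, excluding gap columns when gaps are 'small'."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_query.upper(), aligned_ref.upper()))
    ref_len = sum(1 for _, r in pairs if r != "-")
    gap_cols = sum(1 for q, r in pairs if q == "-" or r == "-")
    compared = [(q, r) for q, r in pairs if q != "-" and r != "-"]
    if not compared:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for q, r in compared if q == r)
    if gap_cols < max_gap_fraction * ref_len:
        denominator = len(compared)  # small deletions/additions are excluded
    else:
        denominator = len(pairs)     # assumption: large gaps count as mismatches
    return 100.0 * matches / denominator


def identity_tier(identity: float) -> str:
    """Map an identity value onto the thresholds named in the text."""
    if identity >= 85.0:
        return "preferred (>= 85%)"
    if identity >= 70.0:
        return "typical (>= 70%)"
    if identity >= 50.0:
        return "minimum (>= 50%)"
    return "below threshold"


if __name__ == "__main__":
    pid = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
    print(pid, identity_tier(pid))
```

A real comparison would first align the clamp to the target; the sketch deliberately leaves alignment out so that only the identity arithmetic is shown.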

The formation of heteroduplex joints is not a stringent process; genetic evidence supports the view that 
the classical phenomena of meiotic gene conversion and aberrant meiotic segregation result in part 
from the inclusion of mismatched base pairs in heteroduplex joints, and the subsequent correction of 
some of these mismatched base pairs before replication. Observations on recA protein have provided 
information on parameters that affect the discrimination of relatedness from perfect or near-perfect 
homology and that affect the inclusion of mismatched base pairs in heteroduplex joints. The ability of 
recA protein to drive strand exchange past all single base-pair mismatches and to form extensively 
mismatched joints in superhelical DNA reflects its role in recombination and gene conversion. This 
error-prone process may also be related to its role in mutagenesis. RecA-mediated pairing reactions 
involving DNA of φX174 and G4, which are about 70 percent homologous, have yielded homologous 
recombinants (Cunningham et al. (1981) Cell 24: 213), although recA preferentially forms homologous 
joints between highly homologous sequences, and is implicated as mediating a homology search 
process between an invading DNA strand and a recipient DNA strand, producing relatively stable 
heteroduplexes at regions of high homology. Accordingly, it is the fact that recombinases can drive the 
homologous recombination reaction between strands which are significantly, but not perfectly, 
homologous, which allows gene conversion and the modification of target sequences. Thus, targeting 
polynucleotides may be used to introduce nucleotide substitutions, insertions and deletions into an 
endogenous consensus functional domain nucleic acid sequence, and thus the corresponding amino 
acid substitutions, insertions and deletions in proteins expressed from the endogenous consensus 
functional domain nucleic acid sequence. By "endogenous" in this context herein is meant the naturally 
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However, the addition of a second complementary ssDNA to the three-strand-containing single-D-loop 
stabilizes the deproteinized hybrid joint molecules by allowing W-C base pairing of the probe with the 
displaced target DNA strand. The addition of a second RecA-coated complementary ssDNA (cssDNA) 
strand to the three-strand containing single D-loop stabilizes deproteinized hybrid joints located away 
from the free ends of the duplex target DNA (Sena & Zarling, Nature Genetics 3:365 (1993); Revet et 
al. J. Mol. Biol. 232:779 (1993); Jayasena and Johnston, J. Mol. Biol. 230:1015 (1993)). The resulting 
four-stranded structure, named a double D-loop by analogy with the three-stranded single D-loop 
hybrid, has been shown to be stable in the absence of RecA protein. This stability likely occurs because 
the restoration of W-C basepairing in the parental duplex would require disruption of two W-C 
basepairs in the double-D-loop (one W-C pair in each heteroduplex D-loop). Since each base-pairing 
in the reverse transition (double-D-loop to duplex) is less favorable by the energy of one W-C basepair, 
the pair of cssDNA probes are thus kinetically trapped in duplex DNA targets in stable hybrid structures. 
The stability of the double-D loop joint molecule within internally located probe:target hybrids is an 
intermediate stage prior to the progression of the homologous recombination reaction to the strand 
exchange phase. The double D-loop permits isolation of stable multistranded DNA recombination 
intermediates. 



The invention may also be practiced with individual targeting polynucleotides which do not comprise 
part of a complementary pair. In each case, a targeting polynucleotide is introduced into a target cell 
simultaneously or contemporaneously with a recombinase protein, typically in the form of a 
recombinase coated targeting polynucleotide as outlined herein (i.e., a polynucleotide pre-incubated 
with recombinase wherein the recombinase is noncovalently bound to the polynucleotide; generally 
referred to in the art as a nucleoprotein filament). 



The present invention allows for the introduction of alterations in the target nucleic acid consensus 
functional domain of a member of a gene family. That is, the fact that heterologies are tolerated in 
targeting polynucleotides allows for two things: first, the use of a heterologous consensus homology 
clamp that may target consensus functional domains of multiple genes, rather than a single gene, 
resulting in a variety of genotypes and phenotypes, and secondly, the introduction of alterations to the 
target sequence. Thus typically, a targeting polynucleotide (or complementary polynucleotide pair) has 
a portion or region having a sequence that is not present in the preselected endogenous targeted 
sequence(s) (i.e., a nonhomologous portion or mismatch) which may be as small as a single 
mismatched nucleotide, several mismatches, or may span up to about several kilobases or more of 
nonhomologous sequence. 

Accordingly, in a preferred embodiment, the methods and compositions of the invention are used for 
inactivation of a gene family gene. That is, exogenous targeting polynucleotides can be used to 
inactivate, decrease or alter the biological activity of one or more genes in a cell (or transgenic 
nonhuman animal or plant). This finds particular use in the generation of animal models of disease 
states, or in the elucidation of gene function and activity, similar to "knock out" experiments. 
Alternatively, the biological activity of the wild-type gene may be either decreased, or the wild-type 
activity altered to mimic disease states. This includes genetic manipulation of non-coding gene 
sequences that affect the transcription of genes, including promoters, repressors, enhancers and 
transcriptional activating sequences. 

Thus in a preferred embodiment, homologous recombination of the targeting polynucleotide and 
endogenous target sequence will result in amino acid substitutions, insertions or deletions in the 
endogenous target sequences, potentially both within the consensus functional domain region and 
outside of it, for example as a result of the incorporation of PCR tags. This will generally result in 
modulated or altered gene function of the endogenous gene, including both a decrease or elimination 
of function as well as an enhancement of function. Nonhomologous portions are used to make 
insertions, deletions, and/or replacements in a predetermined endogenous targeted DNA sequence, 
and/or to make single or multiple nucleotide substitutions in a predetermined endogenous target DNA 
sequence so that the resultant recombined sequence (i.e., a targeted recombinant endogenous 
sequence) incorporates some or all of the sequence information of the nonhomologous portion of the 
targeting polynucleotide(s). Thus, the nonhomologous regions are used to make variant sequences, 
i.e. targeted sequence modifications. In this way, site directed modifications may be done in a variety of 
systems for a variety of purposes. 

The endogenous target sequence, generally a consensus functional domain, may be disrupted in a 
variety of ways. The term "disrupt" as used herein comprises a change in the coding or non-coding 
sequence of an endogenous nucleic acid. In one preferred embodiment, a disrupted gene will no 
longer produce a functional gene product. In another preferred embodiment, a disrupted gene 
produces a variant gene product. Generally, disruption may occur by either the substitution, insertion, 
deletion or frame shifting of nucleotides. 
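As a small illustration of the frame-shifting case mentioned above: within a coding sequence, an insertion or deletion shifts the reading frame whenever the net change in length is not a multiple of three. The helper below is a hypothetical sketch of that rule only and does not model any other kind of disruption.

```python
# Minimal sketch: does a combined insertion/deletion shift the reading frame?
# The function name and signature are illustrative assumptions.

def causes_frameshift(inserted: int = 0, deleted: int = 0) -> bool:
    """True when the net length change is not a multiple of three nucleotides."""
    if inserted < 0 or deleted < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return (inserted - deleted) % 3 != 0


if __name__ == "__main__":
    print(causes_frameshift(inserted=1))             # single-base insertion
    print(causes_frameshift(inserted=3))             # one added codon, frame kept
    print(causes_frameshift(inserted=4, deleted=1))  # net +3, frame kept
```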

In one embodiment, amino acid substitutions are made. This can be the result of either the 
incorporation of a non-naturally occurring consensus sequence into a consensus target, or of more 
specific changes to a particular sequence outside of the consensus sequence. 

In one embodiment, the endogenous sequence is disrupted by an insertion sequence. The term 
"insertion sequence" as used herein means one or more nucleotides which are inserted into an 
endogenous gene to disrupt it. In general, insertion sequences can be as short as 1 nucleotide or as 
long as a gene, as outlined herein. For non-gene insertion sequences, the sequences are at least 1 
nucleotide, with from about 1 to about 50 nucleotides being preferred, and from about 10 to 25 
nucleotides being particularly preferred. An insertion sequence may comprise a polylinker sequence, 
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with from about 1 to about 50 nucleotides being preferred, and from about 10 to 25 nucleotides being 
particularly preferred. An insertion sequence may be a PCR tag used for identification of the first gene. 

In a preferred embodiment, an insertion sequence comprises a gene which not only disrupts the 
endogenous gene, thus preventing its expression, but also can result in the expression of a new gene 
product. Thus, in a preferred embodiment, the disruption of an endogenous gene by an insertion 
sequence gene is done in such a manner as to allow the transcription and translation of the insertion gene. 
An insertion sequence that encodes a gene may range from about 50 bp to 5000 bp of cDNA or about 
5000 bp to 50000 bp of genomic DNA. As will be appreciated by those in the art, this can be done in a 
variety of ways. In a preferred embodiment, the insertion gene is targeted to the endogenous gene in 
such a manner as to utilize endogenous regulatory sequences, including promoters, enhancers or 
other regulatory sequences. In an alternate embodiment, the insertion sequence gene includes its own 
regulatory sequences, such as a promoter, enhancer or other regulatory sequence. 



Particularly preferred insertion sequence genes include, but are not limited to, genes which encode 
selection or reporter proteins. In addition, the insertion sequence genes may be modified or variant 
genes. 

The term "deletion" as used herein comprises removal of a portion of the nucleic acid sequence of an 
endogenous gene. Deletions range from about 1 to about 100 nucleotides, with from about 1 to 50 
nucleotides being preferred and from about 1 to about 25 nucleotides being particularly preferred, 
although in some cases deletions may be much larger, and may effectively comprise the removal of the 
entire consensus functional domain, the entire endogenous gene and/or its regulatory sequences. 
Deletions may occur in combination with substitutions or modifications to arrive at a final modified 
endogenous gene. 



In a preferred embodiment, endogenous genes may be disrupted simultaneously by an insertion and a 
deletion. For example, some or all of an endogenous gene, with or without its regulatory sequences, 
may be removed and replaced with an insertion sequence gene. Thus, for example, all but the 
regulatory sequences of an endogenous gene may be removed, and replaced with an insertion 
sequence gene, which is now under the control of the endogenous gene's regulatory elements. 



The term "regulatory element" is used herein to describe a non-coding sequence which affects the 
transcription or translation of a gene including, but not limited to, promoter sequences, ribosomal 
binding sites, transcriptional start and stop sequences, translational start and stop sequences, enhancer 
or activator sequences, dimerizing sequences, etc. In a preferred embodiment, the regulatory 
sequences include a promoter and transcriptional start and stop sequence. Promoter sequences 
encode either constitutive or inducible promoters. The promoters may be either naturally occurring 
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promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one 
promoter, are also known in the art, and are useful in the present invention. 

In addition, when the targeting polynucleotides are used to generate insertions or deletions in an 
endogenous nucleic acid sequence, as is described herein, the use of two complementary single- 
stranded targeting polynucleotides allows the use of internal homology clamps as depicted in the 
figures of PCT US98/05223. The use of internal homology clamps allows the formation of stable 
deproteinized cssDNA probe:target hybrids with homologous DNA sequences containing either 
relatively small or large insertions and deletions within a homologous DNA target. Without being bound 
by theory, it appears that these probe:target hybrids, with heterologous inserts in the cssDNA probe, are 
stabilized by the re-annealing of cssDNA probes to each other within the double-D-loop hybrid, forming 
a novel DNA structure with an internal homology clamp. Similarly stable double-D-loop hybrids formed 
at internal sites with heterologous inserts in the linear DNA targets (with respect to the cssDNA probe) 
are equally stable. Because cssDNA probes are kinetically trapped within the duplex target, the 
multistranded intermediates of homologous DNA pairing are stabilized and strand exchange is 
facilitated. 

In a preferred embodiment, the length of the internal homology clamp (i.e. the length of the insertion or 
deletion) is from about 1 to 50% of the total length of the targeting polynucleotide, with from about 1 to 
about 20% being preferred and from about 1 to about 10% being especially preferred, although in 
some cases the length of the deletion or insertion may be significantly larger. As for the consensus 
homology clamps, the complementarity within the internal homology clamp need not be perfect. 

A targeting polynucleotide used in a method of the invention typically is a single-stranded nucleic acid, 
usually a DNA strand, or derived by denaturation of a duplex DNA, which is complementary to one (or 
both) strand(s) of the target duplex nucleic acid. Thus, one of the complementary single stranded 
targeting polynucleotides is complementary to one strand of the endogenous target sequence (i.e. 
Watson) and the other complementary single stranded targeting polynucleotide is complementary to 
the other strand of the endogenous target sequence (i.e. Crick). The consensus homology clamp 
sequence preferably contains at least 90-95% sequence homology with the target sequence (although 
as outlined above, less sequence homology can be tolerated), to insure sequence-specific targeting of 
the targeting polynucleotide to the endogenous DNA consensus target. Each single-stranded targeting 
polynucleotide is typically about 50-600 bases long, although a shorter or longer polynucleotide may 
also be employed. 
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complementary to an endogenous target DNA sequence. Vectors containing a targeting polynucleotide 
sequence are typically grown in E. coli and then isolated using standard molecular biology methods. 
Alternatively, targeting polynucleotides may be prepared in single-stranded form by oligonucleotide 
synthesis methods, which may first require, especially with larger targeting polynucleotides, formation of 
subfragments of the targeting polynucleotide, typically followed by splicing of the subfragments 
together, typically by enzymatic ligation. In general, as will be appreciated by those in the art, targeting 
polynucleotides may be produced by chemical synthesis of oligonucleotides, nick-translation of a 
double-stranded DNA template, polymerase chain-reaction amplification of a sequence (or ligase chain 
reaction amplification), purification of prokaryotic or target cloning vectors harboring a sequence of 
interest (e.g., a cloned cDNA or genomic clone, or portion thereof) such as plasmids, phagemids, 
YACs, cosmids, bacteriophage DNA, other viral DNA or replication intermediates, or purified restriction 
fragments thereof, as well as other sources of single and double-stranded polynucleotides having a 
desired nucleotide sequence. When using microinjection procedures it may be preferable to use a 
transfection technique with linearized sequences containing only modified target gene sequence and 
without vector or selectable sequences. The modified gene site is such that a homologous 
recombinant between the exogenous targeting polynucleotide and the endogenous DNA target 
sequence can be identified by using carefully chosen primers and PCR, followed by analysis to detect if 
PCR products specific to the desired targeted event are present (Erlich et al., (1991) Science 252: 
1643, which is incorporated herein by reference). Several studies have already used PCR to 
successfully identify and then clone the desired transfected cell lines (Zimmer and Gruss, (1989) 
Nature 338: 150; Mouellic et al., (1990) Proc. Natl. Acad. Sci. USA 87: 4712; Shesely et al., (1991) 
Proc. Natl. Acad. Sci. USA 88: 4294, which are incorporated herein by reference). This approach is 
very effective when the number of cells receiving exogenous targeting polynucleotide(s) is high (i.e., 
with microinjection, or with liposomes) and the treated cell populations are allowed to expand to cell 
groups of approximately 1 x 10^4 cells (Capecchi, (1989) Science 244: 1288). When the target gene is 
not on a sex chromosome, or the cells are derived from a female, both alleles of a gene can be 
targeted by sequential inactivation (Mortensen et al., (1991) Proc. Natl. Acad. Sci. USA 88: 7036). 
Alternatively, animals heterozygous for the target gene can be bred to homozygosity as is known in the 
art. 

In addition to consensus homology clamps and optional internal homology clamps, the targeting 
polynucleotides of the invention may comprise additional components, such as cell-uptake 
components, chemical substituents, purification tags, etc. 

In a preferred embodiment, at least one of the targeting polynucleotides comprises at least one cell- 
uptake component. As used herein, the term "cell-uptake component" refers to an agent which, when 
bound, either directly or indirectly, to a targeting polynucleotide, enhances the intracellular uptake of the 
targeting polynucleotide into at least one cell type (e.g., hepatocytes). A targeting polynucleotide of the 
invention may optionally be conjugated, typically by covalent or preferably noncovalent binding, to a 


Cell-uptake components are included with recombinase-coated targeting polynucleotides of the 
invention to enhance the uptake of the recombinase-coated targeting polynucleotide(s) into cells, 
particularly for in vivo gene targeting applications, such as gene therapy to treat genetic diseases, 
including neoplasia, and targeted homologous recombination to treat viral infections wherein a viral 
sequence (e.g., an integrated hepatitis B virus (HBV) genome or genome fragment) may be targeted by 
homologous sequence targeting and inactivated. Alternatively, a targeting polynucleotide may be 
coated with the cell-uptake component and targeted to cells with a contemporaneous or simultaneous 
administration of a recombinase (e.g., liposomes or immunoliposomes containing a recombinase, a 
viral-based vector encoding and expressing a recombinase). 

In addition to recombinase and cellular uptake components, at least one of the targeting 
polynucleotides may include chemical substituents. Exogenous targeting polynucleotides that have 
been modified with appended chemical substituents may be introduced along with recombinase (e.g., 
recA) into a metabolically active target cell to homologously pair with a predetermined endogenous 
DNA target sequence in the cell. In a preferred embodiment, the exogenous targeting polynucleotides 
are derivatized, and additional chemical substituents are attached, either during or after polynucleotide 
synthesis, respectively, and are thus localized to a specific endogenous target sequence where they 
produce an alteration or chemical modification to a local DNA sequence. Preferred attached chemical 
substituents include, but are not limited to: cross-linking agents (see Podyminogin et al., Biochem. 
34:13098 (1995) and 35:7267 (1996), both of which are hereby incorporated by reference), nucleic acid 
cleavage agents, metal chelates (e.g., iron/EDTA chelate for iron catalyzed cleavage), topoisomerases, 
endonucleases, exonucleases, ligases, phosphodiesterases, photodynamic porphyrins, 
chemotherapeutic drugs (e.g., adriamycin, doxorubicin), intercalating agents, labels, base-modification 
agents, agents which normally bind to nucleic acids such as labels, etc. (see for example Afonina et al., 
PNAS USA 93:3199 (1996), incorporated herein by reference), immunoglobulin chains, and 
oligonucleotides. Iron/EDTA chelates are particularly preferred chemical substituents where local 
cleavage of a DNA sequence is desired (Hertzberg et al. (1982) J. Am. Chem. Soc. 104: 313; 
Hertzberg and Dervan (1984) Biochemistry 23: 3934; Taylor et al. (1984) Tetrahedron 40: 457; Dervan, 
PB (1986) Science 232: 464, which are incorporated herein by reference). Further preferred are 
groups that prevent hybridization of the complementary single stranded nucleic acids to each other but 
not to unmodified nucleic acids; see for example Kutyavin et al., Biochem. 35:11170 (1996) and Woo 
et al., Nucleic Acids Res. 24(13):2470 (1996), both of which are incorporated by reference. 2'-O methyl 
groups are also preferred (see Cole-Strauss et al., Science 273:1386 (1996); Yoon et al., PNAS 
93:2071 (1996)). Additional preferred chemical substituents include labeling moieties, including 
fluorescent labels. Preferred attachment chemistries include: direct linkage, e.g., via an appended 
reactive amino group (Corey and Schultz (1988) Science 238: 1401, which is incorporated herein by 
reference) and other direct linkage chemistries, although streptavidin/biotin and 
digoxigenin/antidigoxigenin antibody linkage methods may also be used. Methods for linking chemical 
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RecA protein coating of targeting polynucleotides is normally carried out in a standard 1X RecA coating 
reaction buffer. 10X RecA reaction buffer (i.e., 10X AC buffer) consists of: 100 mM Tris acetate (pH 7.5 
at 37°C), 20 mM magnesium acetate, 500 mM sodium acetate, 10 mM DTT, and 50% glycerol. All of 
the targeting polynucleotides, whether double-stranded or single-stranded, typically are denatured 
before use by heating to 95-100°C for five minutes, placed on ice for one minute, and subjected to 
centrifugation (10,000 rpm) at 0°C for approximately 20 seconds (e.g., in a Tomy centrifuge). 
Denatured targeting polynucleotides usually are added immediately to room temperature RecA coating 
reaction buffer mixed with ATPγS and diluted with double-distilled H2O as necessary. 
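The relationship between the 10X stock composition given above and the 1X working buffer is a simple tenfold dilution, sketched below. The dictionary keys and function names are illustrative assumptions; only the 10X concentrations come from the text.

```python
# Minimal sketch: working (1X) concentrations from the 10X RecA coating buffer.
# Key names and helper functions are illustrative assumptions.
from typing import Dict

TEN_X_AC_BUFFER: Dict[str, float] = {
    "Tris acetate (mM)": 100.0,
    "magnesium acetate (mM)": 20.0,
    "sodium acetate (mM)": 500.0,
    "DTT (mM)": 10.0,
    "glycerol (%)": 50.0,
}


def dilute(stock: Dict[str, float], fold: float) -> Dict[str, float]:
    """Return component concentrations after diluting the stock `fold` times."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: conc / fold for name, conc in stock.items()}


def stock_volume_ul(final_volume_ul: float, fold: float = 10.0) -> float:
    """Volume of concentrated stock needed for a given final reaction volume."""
    return final_volume_ul / fold


if __name__ == "__main__":
    for name, conc in dilute(TEN_X_AC_BUFFER, 10.0).items():
        print(name, conc)
    print("10X stock for a 50 µl reaction (µl):", stock_volume_ul(50.0))
```

Under these assumptions the 1X buffer works out to 10 mM Tris acetate, 2 mM magnesium acetate, 50 mM sodium acetate, 1 mM DTT and 5% glycerol.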



A reaction mixture typically contains the following components: (i) 0.2-4.8 mM ATPγS; and (ii) between 
1-100 ng/µl of targeting polynucleotide. To this mixture is added about 1-20 µl of recA protein per 
10-100 µl of reaction mixture, usually at about 2-10 mg/ml (purchased from Pharmacia or purified), 
which is rapidly added and mixed. The final reaction volume for RecA coating of targeting polynucleotide is 
usually in the range of about 10-500 µl. RecA coating of targeting polynucleotide is usually initiated by 
incubating targeting polynucleotide-RecA mixtures at 37°C for about 10-15 min. 



RecA protein concentrations in coating reactions vary depending upon targeting polynucleotide size 
and the amount of added targeting polynucleotide: recA protein concentrations are typically in the 
range of 5 to 50 µM. When single-stranded targeting polynucleotides are coated with recA, 
independently of their complementary strands, the concentrations of ATPγS and recA protein may 
optionally be reduced to about one-half of the concentrations used with double-stranded targeting 
polynucleotides of the same length: that is, the recA protein and ATPγS concentration ratios are 
generally kept constant for a given concentration of individual polynucleotide strands. 

The coating of targeting polynucleotides with recA protein can be evaluated in a number of ways. First, 
protein binding to DNA can be examined using band-shift gel assays (McEntee et al., (1981) J. Biol. 
Chem. 256: 8835). Labeled polynucleotides can be coated with recA protein in the presence of ATPγS 
and the products of the coating reactions may be separated by agarose gel electrophoresis. Following 
incubation of recA protein with denatured duplex DNAs the recA protein effectively coats 
single-stranded targeting polynucleotides derived from denaturing a duplex DNA. As the ratio of recA 
protein monomers to nucleotides in the targeting polynucleotide increases from 0, 1:27, 1:2.7 to 3.7:1 
for 121-mer and 0, 1:22, 1:2.2 to 4.5:1 for 159-mer, the targeting polynucleotide's electrophoretic mobility 
decreases, i.e., is retarded, due to recA-binding to the targeting polynucleotide. Retardation of the 
coated polynucleotide's mobility reflects the saturation of targeting polynucleotide with recA protein. An 
excess of recA monomers to DNA nucleotides is required for efficient recA coating of short targeting 
polynucleotides (Leahy et al., (1986) J. Biol. Chem. 261: 6954). 
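The monomer-to-nucleotide ratios quoted above can be estimated from the reaction concentrations with a short calculation. The sketch below assumes an average mass of about 330 g/mol per single-stranded DNA nucleotide, which is a common approximation and not a value taken from the specification; the function names are likewise hypothetical.

```python
# Minimal sketch: molar ratio of recA monomers to DNA nucleotides in a coating
# reaction. The 330 g/mol average nucleotide mass is an assumed approximation.

AVG_NT_MASS_G_PER_MOL = 330.0  # assumption: average ssDNA nucleotide mass


def nucleotide_conc_uM(dna_ng_per_ul: float) -> float:
    """Convert a DNA mass concentration (ng/µl) to µM of nucleotides."""
    # 1 ng/µl equals 1e-3 g/L; dividing by g/mol gives mol/L, and
    # multiplying by 1e6 gives µM, hence the overall factor of 1000.
    return dna_ng_per_ul * 1000.0 / AVG_NT_MASS_G_PER_MOL


def reca_to_nucleotide_ratio(reca_uM: float, dna_ng_per_ul: float) -> float:
    """Monomers of recA per nucleotide; values above 1 mean recA is in excess."""
    return reca_uM / nucleotide_conc_uM(dna_ng_per_ul)


if __name__ == "__main__":
    # Hypothetical reaction: 50 µM recA with 33 ng/µl targeting polynucleotide.
    print(reca_to_nucleotide_ratio(50.0, 33.0))
```

Because the ratio is expressed per nucleotide, it does not depend on the length of the targeting polynucleotide, only on the mass concentration of DNA in the reaction.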



A second method for evaluating protein binding to DNA is in the use of nitrocellulose filter binding 
assays (Leahy et al., (1986) J. Biol. Chem. 261: 6954; Woodbury, et al., (1983) Biochemistry 




... labeled DNA. In the filter binding assay ... complexes from free targeting polynucleotide ... 

... simultaneously or contemporaneously (i.e., within about a few hours) with the ... may 
also be used. Alternatively, ... produced from a homologous or heterologous ... such as a 
transgenic ... embryonal stem ... ES cell, such as AB-1, used to ... pluripotent hematopoietic ... 
heterologous expression cassette includes a modulatable promoter, such as an ... promoter, or 
other cell-type specific, developmental ... receptor present, either naturally or as a consequence 
of ... transfected expression vector encoding such receptor. Alternatively, ... tumor cells, the target 
cells produce an elevated level of recombinase ... Alternatively, recombinase levels may be elevated 
by transfection ... encoding the recombinase gene into the cell. 

... functional genomic ... as well as in the identification of new drug targets; both ... animal 
models ... consensus functional domains, etc. 






In a preferred embodiment, the present invention finds use in the isolation of new members of gene 
families. As is generally depicted in Figure 2, the use of HMT filaments (i.e. consensus homology 
clamps preferably containing a purification tag such as biotin or digoxigenin, or used with a purification 
method such as a recA antibody) allows the identification of new genes within the gene family. Once 
identified, the new genes can be cloned, sequenced and the protein gene products purified. As will be 
appreciated by those in the art, the functional importance of the new genes can be assessed in a 
number of ways, including functional studies on the protein level, as well as the generation of "knock 
out" animal models. By choosing consensus sequences for therapeutically relevant gene families, 
novel targets can be identified that can be used in screening of drug candidates. 



Thus, in a preferred embodiment, the present invention provides methods for isolating new members of 
gene families comprising introducing targeting polynucleotides comprising consensus homology clamps 
and at least one purification tag, preferably biotin, to a mix of nucleic acid, such as a plasmid cDNA 
library or a cell, and then utilizing the purification tag to isolate the gene(s). The exact methods will 
depend on the purification tag; a preferred method utilizes the attachment of the binding ligand for the 
tag to a bead, which is then used to pull out the sequence. Alternatively, anti-recA antibodies could be 
used to capture recA-coated probes. The genes are then cloned, sequenced, and reassembled if 
necessary, as is well known in the art. 
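The isolation method above depends on first choosing a consensus sequence for the gene family. One simple way to derive such a sequence from already aligned family members is a per-column majority vote, sketched below. This is an illustrative assumption, not the procedure used in the specification, and the example sequences are hypothetical.

```python
# Minimal sketch: majority-vote consensus from pre-aligned, equal-length sequences.
# Illustrative only; the specification does not prescribe this procedure.
from collections import Counter
from typing import Sequence


def majority_consensus(aligned_seqs: Sequence[str]) -> str:
    """Return the most common base at each aligned position."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    seqs = [s.upper() for s in aligned_seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*seqs):
        # most_common(1) returns the highest-count base for this column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    family = ["ACGT", "ACGA", "TCGT"]  # hypothetical aligned family members
    print(majority_consensus(family))
```

In practice a degenerate or weighted consensus might be preferred where a column has no clear majority; the sketch ignores that case.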



In an alternate preferred embodiment, the present invention finds use in functional genomic studies, by 
providing the creation of transgenic animal models of disease. Thus, for example, HMTs used in 
homologous recombination methods can generate animals that have a wide variety of mutations in a 
wide variety of related genes, potentially resulting in a wide variety of phenotypes, including phenotypes 
related to disease states. That is, by targeting a gene family, one, two or multiple genes in the family 
may be altered in any given experiment, thus creating a wide variety of genotypes and phenotypes to 
evaluate. Thus, in a preferred embodiment, the compositions and methods of the invention are used to 
generate pools or libraries of variant nucleic acid sequences, wherein the mutations are within the 
consensus functional domain coding region, cellular libraries containing the variant libraries, and 
libraries of animals containing the variant libraries. 



Furthermore, HMT targeting can be used in cells or animals that are diseased or altered; in essence,
HMT targeting can be done to identify "reversion" genes, genes that can modulate disease states
caused by different genes, either genes within the same gene family or a completely different gene
family. Thus, for example, the loss of one type of enzymatic activity, resulting in a disease phenotype,
may be compensated by alterations in a different but homologous enzymatic activity.



Accordingly, once the recombinase-targeting polynucleotide compositions are formulated, they are 
introduced or administered into target cells. The administration is typically done as is known for the
administration of nucleic acids into cells, and, as those skilled in the art will appreciate, the methods




WO 99/37755 PCT/US98/26498 

may depend on the choice of the target cell. Suitable methods include, but are not limited to,
microinjection, electroporation, lipofection, etc. By "target cells" herein is meant prokaryotic or
eukaryotic cells. Suitable prokaryotic cells include, but are not limited to, bacteria such as E. coli,
Bacillus species, and the extremophile bacteria such as thermophiles, halophiles, etc. Preferably, the
procaryotic target cells are recombination competent. Suitable eukaryotic cells include, but are not
limited to, fungi such as yeast and filamentous fungi, including species of Aspergillus, Trichoderma, and
Neurospora; plant cells including those of corn, sorghum, tobacco, canola, soybean, cotton, tomato,
potato, alfalfa, sunflower, etc.; and animal cells, including fish, reptiles, amphibia, birds and mammals.
Suitable fish cells include, but are not limited to, those from species of salmon, trout, tilapia, tuna, carp,
flounder, halibut, swordfish, cod and zebrafish. Suitable bird cells include, but are not limited to, those
of chickens, ducks, quail, pheasants, ostrich, and turkeys, and other jungle fowl or game birds. Suitable
mammalian cells include, but are not limited to, cells from horses, cows, buffalo, deer, sheep, rabbits,
rodents such as mice, rats, hamsters and guinea pigs, goats, pigs, primates, marine mammals
including dolphins and whales, as well as cell lines, such as human cell lines of any tissue or stem cell
type, and stem cells, including pluripotent and non-pluripotent, and non-human zygotes. Particular
human cells include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid
leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes),
cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T-cell and B cell), mast cells,
eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem
cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells, osteoclasts,
chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and
adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells,
mouse La, HT1080, C127, Rat2, CV-1, NIH3T3 cells, CHO, COS, 293 cells, etc. See the ATCC cell
line catalog, hereby expressly incorporated by reference.



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In a preferred embodiment, procaryotic cells are used to identify, clone, or alter gene family members. 
In this embodiment, a pre-selected target DNA sequence is chosen for alteration. Preferably, the pre- 
selected target DNA sequence is contained within an extrachromosomal sequence. By 
"extrachromosomal sequence" herein is meant a sequence separate from the chromosomal or 
genomic sequences. Preferred extrachromosomal sequences include plasmids (particularly procaryotic
plasmids such as bacterial plasmids), P1 vectors, viral genomes, yeast, bacterial and mammalian
artificial chromosomes (YAC, BAC and MAC, respectively), and other autonomously self-replicating
sequences, although this is not required. As described herein, a recombinase and at least two single
stranded targeting polynucleotides which are substantially complementary to each other, each of which
contains a homology clamp to the target sequence contained on the extrachromosomal sequence, are
added to the extrachromosomal sequence, preferably in vitro. The two single stranded targeting
polynucleotides are preferably coated with recombinase, and at least one of the targeting
polynucleotides contains at least one nucleotide substitution, insertion or deletion. The targeting
polynucleotides then bind to the target sequence in the extrachromosomal sequence to effect 
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homologous recombination and form an altered extrachromosomal sequence which contains the 
substitution, insertion or deletion. The altered extrachromosomal sequence is then introduced into the 
procaryotic cell using techniques known in the art. Preferably, the recombinase is removed prior to
introduction into the target cell, using techniques known in the art. For example, the reaction may be
treated with proteases such as proteinase K, detergents such as SDS, and phenol extraction (including
phenol:chloroform:isoamyl alcohol extraction). These methods may also be used for eukaryotic cells.

Alternatively, the pre-selected target DNA sequence is a chromosomal sequence. In this embodiment,
the recombinase and the targeting polynucleotides are introduced into the target cell, preferably
a eukaryotic target cell. In this embodiment, it may be desirable to bind (generally non-covalently) a
nuclear localization signal to the targeting polynucleotides to facilitate localization of the complexes in
the nucleus. See, for example, Kido et al., Exper. Cell Res. 198:107-114 (1992), hereby expressly
incorporated by reference. The targeting polynucleotides and the recombinase function to effect
homologous recombination, resulting in altered chromosomal or genomic sequences.

In a preferred embodiment, eukaryotic cells are used. For making transgenic non-human animals
(which include homologously targeted non-human animals), embryonal stem cells (ES cells), donor
cells for nuclear transfer and fertilized zygotes are preferred. In a preferred embodiment, embryonal
stem cells are used. Murine ES cells, such as the AB-1 line grown on mitotically inactive SNL76/7 cell
feeder layers (McMahon and Bradley, Cell 62:1073-1085 (1990)) essentially as described (Robertson,
E.J. (1987) in Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, E.J. Robertson, ed.
(Oxford: IRL Press), p. 71-112; Zijlstra et al., Nature 342:435-438 (1989); and Schwartzberg et al.,
Science 246:799-803 (1989), each of which is incorporated herein by reference) may be used for
homologous gene targeting. Other suitable ES lines include, but are not limited to, the E14 line
(Hooper et al. (1987) Nature 326:292-295), the D3 line (Doetschman et al. (1985) J. Embryol. Exp.
Morph. 87:21-45), and the CCE line (Robertson et al. (1986) Nature 323:445-448). The success of
generating a mouse line from ES cells bearing a specific targeted mutation depends on the
pluripotence of the ES cells (i.e., their ability, once injected into a host blastocyst, to participate in
embryogenesis and contribute to the germ cells of the resulting animal).

The pluripotence of any given ES cell line can vary with time in culture and the care with which it has
been handled. The only definitive assay for pluripotence is to determine whether the specific population
of ES cells to be used for targeting can give rise to chimeras capable of germline transmission of the
ES genome. For this reason, prior to gene targeting, a portion of the parental population of AB-1 cells
is injected into C57BL/6J blastocysts to ascertain whether the cells are capable of generating chimeric
mice with extensive ES cell contribution and whether the majority of these chimeras can transmit the ES
genome to progeny.
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Alternatively, the target cells may be screened to identify a cell that contains the targeted consensus 
functional domain sequence modification. This will be done in any number of ways, and will depend on 
the target gene and targeting polynucleotides as will be appreciated by those in the art. The screen 
may be based on phenotypic, biochemical, genotypic, or other functional changes, depending on the 
target sequence. For example, IgE levels may be evaluated for inflammation or asthma; vascular tone
or blood pressure can be evaluated for hypertension; behavior screens can be done for neurologic
effects; lipoprotein profiles can be screened for cardiovascular effects; secreted molecules can be
evaluated for endocrine processes; CBCs can be done for hematology studies, etc. In an additional
embodiment, as will be appreciated by those in the art, selectable markers or marker sequences may
be included in the targeting polynucleotides to facilitate later identification.



In a preferred embodiment, kits containing the compositions of the invention are provided. The kits 
include the compositions, particularly those of libraries or pools of degenerate cssDNA probes, along 
with any number of reagents or buffers, including recombinases, buffers, salts, ATP, etc. 

The broad scope of this invention is best understood with reference to the following examples, which
are not intended to limit the invention in any manner. All references cited herein are expressly
incorporated by reference. Although the present invention has been described in some detail by way of
illustration for purposes of clarity of understanding, it will be apparent that certain changes and
modifications may be practiced within the scope of the claims.



EXAMPLES 

20 Example 1 

Calcitonin Type GPCR subfamily 



A Calcitonin type GPCR subfamily serves as an example. The first consensus motif used is "TWDGW",
for which degenerate oligonucleotide "ACNTGGGAYGGNTGG" is synthesized. The second consensus
motif is "GWGFP", for which antisense degenerate oligonucleotide "NGGRAANCCCCANCC" is
synthesized. The degeneracy of these oligos is 32 and 128, respectively, with each oligo containing a
biotin moiety at the 5' end. cDNA or a cDNA library is used as a template for PCR amplification using
the described oligonucleotides as primers. The double-stranded amplified product is thermally denatured,
cooled and coated with RecA as described. A cDNA library is used as substrate for targeting. After
binding the specific target plasmid and washing away nonspecific sequences, the bound material can be
analyzed. Bound plasmids are transformed into E. coli cells, with colony PCR performed using the
original oligonucleotides as primers. This particular example should yield a PCR product of about 600
base pairs, depending on the family member isolated. Other screening procedures can also be used,
including but not limited to hybridization to homologous probes, complementation of cells mutant for a
family member, etc. Positive colonies (yielding efficient and specific amplification) are further analyzed
by sequencing to identify family members. The DNA sequences can then be reverse transcribed by
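
The degeneracy figures quoted in Example 1 (32 and 128) and the relationship between each motif and its oligonucleotide can be checked with a short Python sketch. It assumes the standard genetic code and the IUPAC nucleotide ambiguity codes; the helper names (degeneracy, reverse_complement, back_translate) are hypothetical, and the codon table covers only the six residues that occur in the two motifs.

```python
from math import prod

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (for example, R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

# Degenerate codons under the standard genetic code, limited to the
# residues found in the motifs TWDGW and GWGFP.
DEGENERATE_CODON = {
    "T": "ACN", "W": "TGG", "D": "GAY", "G": "GGN", "F": "TTY", "P": "CCN",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligo."""
    return prod(len(IUPAC[code]) for code in oligo)


def reverse_complement(oligo):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[code] for code in reversed(oligo))


def back_translate(motif):
    """Degenerate sense-strand oligo encoding a peptide motif."""
    return "".join(DEGENERATE_CODON[residue] for residue in motif)


sense = "ACNTGGGAYGGNTGG"      # oligo for TWDGW
antisense = "NGGRAANCCCCANCC"  # antisense oligo for GWGFP

print(degeneracy(sense), degeneracy(antisense))                  # 32 128
print(back_translate("TWDGW") == sense)                          # True
print(back_translate("GWGFP") == reverse_complement(antisense))  # True
```

Each N contributes a factor of 4 and each R or Y a factor of 2, so the sense oligo gives 4 x 2 x 4 = 32 and the antisense oligo gives 4 x 2 x 4 x 4 = 128, in agreement with the values stated in the example.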
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Example 3 
β-adrenergic receptors



At least three distinct beta adr
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smooth muscle tone. These subtypes, the β1, β2, and β3 adrenergic receptors, share the consensus
sequence shown in Fig. 1B. While the β3 subtype appears to be primarily expressed in adipose tissue
where it may regulate metabolic processes, the functional contributions specific to either the β1 or β2
receptor have proven to be more difficult to assess, as some tissues express both receptor subtypes and
pharmacological agents used to dissect the relative contributions of different receptors are not always
subtype-specific. Again, as with α2-ARs, knockout systems have greatly increased our knowledge of
subtype-specific effects. Knockout animals have not only allowed assignment of function to individual
subtypes but also serve as a test for functional redundancy between subtypes. Rohrer et al. (1996)
have shown that the mouse β1 receptor plays a role in development, and regulates the chronotropic
and inotropic responses after administration of agonist. As described for α2-ARs, we can use similar
phenotypic screens to isolate, identify and determine function for members of the β-AR family.



Example 4 
14-3-3 Proteins 



A fundamental problem in drug discovery for cancer is that model systems are not predictive. Drug
candidates are tested in animals carrying transplanted human tumors (xenografts), but very few drugs
that show anticancer activity in xenografts have been successful in clinical trials. Furthermore, cancer is
a polygenic disease; hence, it is difficult to produce transgenic animal models for cancer with single
gene modifications.

Most cancers result from defects in DNA repair, cell cycle checkpoint and regulation, or cell apoptosis.
Members of the 14-3-3 family are involved in many of these pathways. For instance, 14-3-3 proteins
are involved in cell cycle control. After DNA damage, 14-3-3 expression is increased by p53; this
results in the binding of 14-3-3 protein to phosphorylated Cdc25C, which in turn results in the
dephosphorylation of Cdc2, which finally causes the cell cycle to stop at G2 stage (Hermeking, H.,
Molecular Cell, 1997, vol. 1, 3-11). 14-3-3 protein binds the phosphorylated BAD gene product, an
agonist of apoptosis (Zha, J., Cell, 1996, vol. 87, 619-628; Zha, J., J. Biol. Chem., 1997, vol. 272, 24101-
24104). 14-3-3 proteins also regulate Raf, Cbl and other oncogene activities (Clark, G.J.,
J. Biol. Chem., 1997, vol. 272, 20990-20993; Tzivion, G., Nature, 1998, vol. 394, 88-92). In addition, 14-3-3
protein expression is increased in bladder squamous cell carcinomas and lung tumor tissues
(Ostergaard, M., Cancer Res., 1997, vol. 57, 4111-4117; Nakanishi, K., Hum. Antibodies, 1997, vol. 8,
189-94).



Using 14-3-3 binding domains as a consensus probe for HMT targeting, several genes in the 14-3-3
family can be knocked out or modified at the same time to generate cancer models. In the 14-3-3 gene
family, the binding sites in 14-3-3 proteins are highly conserved between species and various isoforms.
This conservation is more than 90% at the amino acid level, and more than 70% at the DNA sequence level
(Figure 4). Targeting probes designed to substitute two basic amino acids (R, K) with acidic amino
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CLAIMS 



We claim: 



1. A composition comprising at least one recombinase and at least two single-stranded targeting
polynucleotides which are substantially complementary to each other and each having a consensus
homology clamp for a gene family.

2. A composition according to claim 1 comprising at least one recombinase and a plurality of pairs of 
single stranded targeting polynucleotides which are substantially complementary to each other and 
each comprising a consensus homology clamp for a gene family, said plurality of pairs comprising a set 
of degenerate probes encoding the consensus sequence. 



3. A composition according to claim 1 or 2 wherein said gene family is selected from the group
consisting of the G-protein coupled receptor family, the AAA-protein family, the bZIP transcription factor
family, the mutS family, the recA family, the recF family, the Bcl-2 family, the single-stranded binding
protein family, the TFIID transcription family, the TGF-beta family, the TNF family, the XPA family, the
14-3-3 family, and the XPG family.

4. A composition according to claim 1, 2 or 3 wherein at least one of said polynucleotides further
comprises an insertion sequence.

5. A composition according to claim 1, 2, 3 or 4 wherein at least one of said polynucleotides further
comprises a purification tag.

6. A composition according to claim 1, 2, 3, 4 or 5 wherein said targeting polynucleotides are coated
with recombinase.

7. A composition according to claim 1, 2, 3, 4, 5 or 6 wherein said recombinase is a species of
prokaryotic recombinase.

8. A composition according to claim 1, 2, 3, 4, 5 or 6 wherein said recombinase is a species of
eukaryotic recombinase.

9. A kit comprising the composition of claim 1, 2, 3, 4, 5, 6, 7 or 8 and at least one reagent.

10. A method for targeting a sequence modification in at least one member of a consensus family of 
genes in a cell by homologous recombination, said method comprising introducing into at least one cell 
at least one recombinase and at least two single-stranded targeting polynucleotides which are
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substantially complementary to each other and each having a consensus homology clamp for said
family. 

11. A method according to claim 10 further comprising identifying a target cell having a targeted
sequence modification.

12. A method of making a non-human organism with a targeted sequence modification in at least one
member of a gene family, said method comprising

a) introducing into a cell at least one recombinase and at least two single-stranded
targeting polynucleotides which are substantially complementary to each other and
each having a consensus homology clamp for said family; and
b) subjecting said cell to conditions that result in the formation of an animal;

wherein said animal has at least one modification in at least one member of a consensus family of
genes.

13. A method according to claim 10, 11 or 12 wherein the targeted sequence modification comprises
the substitution of at least one nucleotide.

14. A method according to claim 10, 11, 12 or 13 wherein the targeted sequence modification
comprises a plurality of substitutions.

15. A method of isolating a member of a gene family comprising a protein consensus sequence, said
method comprising:

a) adding to a complex mixture of nucleic acids
i) at least one recombinase; and
ii) at least two single-stranded targeting polynucleotides which are substantially
complementary to each other and each having a consensus homology clamp
for said family, wherein at least one of said targeting polynucleotides comprises
a purification tag;
under conditions whereby said targeting polynucleotides form a complex with said
member; and

b) isolating said member using said purification tag.

16. A method according to claim 10, 11, 12, 13, 14 or 15 wherein said targeting polynucleotides are 
coated with said recombinase. 

17. A method according to claim 10, 11, 12, 13, 14, 15 or 16 wherein the recombinase and the
targeting polynucleotides are introduced simultaneously.
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18. A method according to claim 10, 11, 12, 13, 14, 15, 16 or 17 wherein said cell is a eukaryotic cell.

19. A method according to claim 10, 11, 12, 13, 14, 15, 16 or 17 wherein said cell is a procaryotic cell. 

20. A method according to claim 10, 11, 12, 13, 14, 15, 16, 17, 18 or 19 wherein said cell is from an 
organism with a genotypic disease state. 

21. A method according to claim 15 wherein said complex mixture is a cDNA library or a cell.

22. A non-human organism containing a sequence modification in an endogenous consensus
functional domain of a gene member of a gene family.
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